On-line soft error correction in matrix-matrix multiplication

Authors

  • Panruo Wu
  • Chong Ding
  • Longxiang Chen
  • Teresa Davies
  • Christer Karlsson
  • Zizhong Chen
Abstract

Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Soft errors normally do not interrupt the execution of the affected program, but the affected computation results can no longer be trusted. A well-known technique to correct soft errors in matrix-matrix multiplication is algorithm-based fault tolerance (ABFT). While ABFT achieves much better efficiency than triple modular redundancy (TMR), a traditional general technique to correct soft errors, both ABFT and TMR detect errors off-line, after the computation is finished. This paper extends the traditional ABFT technique from off-line to on-line, so that soft errors in matrix-matrix multiplication can be detected in the middle of the computation, during program execution, and higher efficiency can be achieved by correcting the corrupted computations in a timely manner. Experimental results demonstrate that the proposed technique can correct one error every ten seconds with negligible (i.e. less than 1%) performance penalty.

Keywords: Algorithm-based fault tolerance; Matrix multiplication; Fault tolerant linear algebra; On-line algorithm based fault tolerance
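The checksum idea behind ABFT can be illustrated with a small sketch. It is not the authors' implementation: it assumes the classic encoding in which a row of column sums is appended to A and a column of row sums is appended to B, and it computes the product as a sequence of rank-1 (outer-product) updates so that the checksum relations can be verified between updates rather than only at the end. The function names, the `check_every` parameter, and the `fault` injection tuple are hypothetical and exist only for illustration.

```python
def encode_column_checksum(a):
    """Return a copy of a with one extra row holding each column's sum."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def encode_row_checksum(b):
    """Return a copy of b with one extra column holding each row's sum."""
    return [row[:] + [sum(row)] for row in b]


def find_and_fix_single_error(c_full, tol=1e-9):
    """Verify the row and column checksums of c_full in place.

    Returns (row, col) of a corrected data entry, or None if consistent.
    Assumption: at most one data entry is corrupted; errors in the
    checksum row or column themselves are not handled by this sketch.
    The absolute tolerance suits small exact examples only.
    """
    n_rows, n_cols = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if abs(sum(c_full[i][:n_cols]) - c_full[i][n_cols]) > tol]
    bad_cols = [j for j in range(n_cols)
                if abs(sum(c_full[i][j] for i in range(n_rows))
                       - c_full[n_rows][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("not a single corrupted data entry")
    i, j = bad_rows[0], bad_cols[0]
    # Rebuild the bad entry from the row checksum and the other entries.
    others = sum(c_full[i][:n_cols]) - c_full[i][j]
    c_full[i][j] = c_full[i][n_cols] - others
    return (i, j)


def abft_matmul(a, b, check_every=1, fault=None):
    """Multiply a by b with checksum verification every few rank-1 updates.

    fault, if given, is a hypothetical (step, row, col, delta) tuple that
    simulates a soft error by adding delta to one entry after that step.
    Returns the product and a list of (step, (row, col)) corrections.
    """
    a_c, b_r = encode_column_checksum(a), encode_row_checksum(b)
    rows, inner, cols = len(a_c), len(b), len(b_r[0])
    c_full = [[0.0] * cols for _ in range(rows)]
    corrected = []
    for k in range(inner):
        # Rank-1 update: the checksum relations hold after every step.
        for i in range(rows):
            for j in range(cols):
                c_full[i][j] += a_c[i][k] * b_r[k][j]
        if fault is not None and fault[0] == k:
            c_full[fault[1]][fault[2]] += fault[3]  # simulated soft error
        if (k + 1) % check_every == 0 or k == inner - 1:
            fixed = find_and_fix_single_error(c_full)
            if fixed is not None:
                corrected.append((k, fixed))
    # Strip the checksum row and column to recover the plain product.
    return [row[:-1] for row in c_full[:-1]], corrected
```

Calling `abft_matmul(a, b, check_every=1, fault=(0, 1, 0, 100.0))` injects an error after the first update and repairs it at once, whereas a large `check_every` defers the check to the end, which corresponds to the off-line behaviour the abstract describes. How often to check is the trade-off between detection latency and overhead.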

Similar resources

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...

Algorithm-Based Secure and Fault Tolerant Outsourcing of Matrix Computations

We study interactive algorithmic schemes for outsourcing matrix computations on untrusted global computing infrastructures such as clouds or volunteer peer-to-peer platforms. In these schemes the client outsources part of the computation with guarantees on both the inputs’ secrecy and the output’s integrity. For the sake of efficiency, thanks to interaction, the number of operations performed by th...

Error correction in fast matrix multiplication and inverse

We present new algorithms to detect and correct errors in the product of two matrices, or the inverse of a matrix, over an arbitrary field. Our algorithms do not require any additional information or encoding other than the original inputs and the erroneous output. Their running time is softly linear in the number of nonzero entries in these matrices when the number of errors is sufficiently sm...

Matrix-Vector Multiplication via Erasure Decoding

The problem of fast evaluation of a matrix-vector product over GF(2) is considered. The problem is reduced to erasure decoding of a linear error-correcting code. A large set of redundant parity check equations for this code is generated. The multiplication algorithm is derived by tracking the execution of the message-passing algorithm on the obtained set of parity check equations. The obtained...

Autotuning Gemms for Fermi *

In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of the Fermi architecture by NVIDIA, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. Being the crucial component of numerical software...

Journal:
  • J. Comput. Science

Volume 4  Issue 

Pages  -

Publication date 2013